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Abstract — THIS PAPER IS ELIGIBLE FOR THE STUDENT 
PAPER AWARD. It has been shown recently that the dirty- 
paper coding is the optimal strategy for maximizing the sum rate 
of multiple-input multiple-output Gaussian broadcast channels 
(MIMO BC). Moreover, by the channel duality, the nonconvex 
MIMO BC sum rate problem can be transformed to the convex 
dual MIMO multiple-access channel (MIMO MAC) problem with 
a sum power constraint. In this paper, we design an efficient 
algorithm based on conjugate gradient projection (CGP) to solve 
the MIMO BC maximum sum rate problem. Our proposed CGP 
algorithm solves the dual sum power MAC problem by utilizing 
the powerful concept of Hessian conjugacy. We also develop a 
rigorous algorithm to solve the projection problem. We show 
that CGP enjoys provable convergence, nice scalability, and great 
efficiency for large MIMO BC systems. 

I. Introduction 

Recently, researchers have shown great interests in charac- 
terizing the capacity region for multiple-input multiple-output 
broadcast channels (MIMO BC) and MIMO multiple-access 
channels (MIMO MAC). In particular, although the general 
capacity region for MIMO BC remains an open problem [1], 
the sum rate region has been shown achievable by the dirty- 
paper coding strategy [2], [3]. Moreover, by the remarkable 
channel duality between MIMO BC and MIMO MAC estab- 
lished in [4]-[6], the nonconvex MIMO BC sum rate problem 
can be transformed to the convex dual MIMO MAC problem 
with a sum power constraint. 

However, although the standard interior point convex op- 
timization method can be used to solve the sum power 
MIMO MAC problem, its complexity is considerably higher 
than those methods that exploit the special structure of the 
sum power MIMO MAC problem. Such specifically designed 
algorithms include the minimax method (MM) by Lan and 
Yu [7], the steepest descent (SD) method by Viswanathan et 
al. [8], the dual decomposition (DD) method by Yu [9], and 
two iterative water-filling methods (IWFs) by Jindal et al. [10]. 
Among these algorithms, MM is more complex than the others 
having the linear complexity. SD and DD have longer running 
time per iteration than IWFs due to line searches and the inner 
optimization, respectively. Both IWFs in [10], however, do not 
scale well as the number of users, denoted by K, increases. 
The reason is that in each iteration of IWFs, the most recently 
updated solution only accounts for a fraction of 1/K in the 
effective channels' computation. The authors of [10] proposed 



a hybrid algorithm as a remedy. But the hybrid algorithm 
introduces additional complexity in implementation and its 
performance depends upon the empirical switch timing, which, 
in turn, depends upon specific problems. In addition, one of 
the IWFs in [10], although converges relatively faster than the 
other one, requires a total storage size for K 2 input covariance 
matrices. These limitations of the existing algorithms motivate 
us to design an efficient and scalable algorithm with a modest 
storage requirement for solving large MIMO BC systems. 

Our major contribution in this paper is that we design a 
fast algorithm based on Conjugate Gradient Projection (CGP) 
approach. Our algorithm is inspired by [11], where a gradient 
projection method was used to heuristically solve another 
nonconvex maximum sum rate problem for single-hop MIMO- 
based ad hoc networks with mutual interference. However, 
unlike [1 1], we use the conjugate gradient directions instead of 
gradient directions to eliminate the "zigzagging" phenomenon 
encountered in [11]. Also, we develop a rigorous algorithm 
to exactly solve the projection problem (in [11], the way 
of handling gradient projection is based on heuristic: The 
authors simply set the first derivative to zero to get the solution 
when solving the constrained Lagrangian dual of the projection 
problem). The attractive features of our proposed CGP are as 
follows: 

1) CGP is extremely fast, and enjoys provable convergence 
as well as nice scalability. As opposed to IWFs, the 
number of iterations required for convergence in CGP is 
very insensitive to the increase of the number of users. 

2) CGP has the desirable linear complexity. By adopting 
the inexact line search method called 'Armijo's Rule", 
we show that CGP has a comparable complexity to 
IWFs per iteration, and requires much fewer iterations 
for convergence in large MIMO BC systems. 

3) CGP has a modest memory requirement: It only needs 
the solution information from the previous step, as 
opposed to one of the IWFs, which requires the solution 
information from previous K — 1 steps. Moreover, CGP 
is very intuitive and easy to implement. 

The remainder of this paper is organized as follows. In 
Section II, we discuss the network model and formulation. 
Section III introduces the key components in our CGP frame- 
work, including conjugate gradient computation and how to 



perform projection. We analyze and compare the complexity of 
CGP with other existing algorithms in Section IV. Numerical 
results are presented in Section V. Section VI concludes this 
paper. 

II. System Model and Problem Formulation 

We first introduce notation. We use boldface to denote 
matrices and vectors. For a complex-valued matrix A, A* 
and A 1 * denotes the conjugate and conjugate transpose of A, 
respectively. Tr{A} denotes the trace of A. We let I denote 
the identity matrix with dimension determined from context. 
A y represents that A is Hermitian and positive semidefinite 
(PSD). DiagjAi . . . A„} represents the block diagonal matrix 
with matrices Ai , . . . , A„ on its main diagonal. 

Suppose that a MIMO Gaussian broadcast channel has K 
users, each of which is equipped with n r antennas, and the 
transmitter has n t antennas. The channel matrix for user i is 
denoted as H, G O xn '. 

In [2], [4]-[6], it has been shown that the maximum sum 
rate capacity of MIMO BC is equal to the dirty-paper coding 
region, which can be computed by solving the optimization 
problem as follows: 



Maximize T K log Ml+gjlgkjliK 
Maximize ^ i=1 log dot(l+Hi(E j-i r . )H t 

subject to Ti >z 0, i = l,2,...,K 



(1) 



Er =1 Tr(r,)<P, 



where I\ 6 C" tX "*, i = 1, . . . , K, are the downlink input 
covariance matrices. It is evident that (1) is a nonconvex 
optimization problem. However, the authors in [4], [6] showed 
that due to the duality between MIMO BC and MIMO MAC, 
(1) is equivalent to the following MIMO MAC problem with 
a sum power constraint: 



Maximize log det (i + J2? =1 hJQ.H, 
subject to Qi y 0, i = 1, 2, . . . , K 



(2) 



£tiTr(Q,)<P, 



where G C nrXrir , i = 1, . . . , K are the uplink in- 
put covariance matrices. For convenience, we use the ma- 
trix Q = [ Qi Q2 • • • Qk ] to denote the set of 
all uplink input covariance matrices, and let P(Q) = 
log dot ^1 + £iLi HjQiH^ represent the objective function 
of (2). After solving (2), we can recover the solutions of (1) 
by the mapping proposed in [4]. 

III. Conjugate Gradient Projection for MIMO BC 

In this paper, we propose an efficient algorithm based on 
conjugate gradient projection (CGP) to solve (2). CGP utilizes 
the important and powerful concept of Hessian conjugacy to 
deflect the gradient direction so as to achieve the superlin- 
ear convergence rate [12] similar to that of the well-known 
quasi-Newton methods (e.g., BFGS method). Also, gradient 
projection is a classical method originally proposed by Rosen 
[13] aiming at solving constrained nonlinear programming 
problems. But its convergence proof has not been established 



until very recently [12]. The framework of CGP for solving 
(2) is shown in Algorithm 1. 

Algorithm 1 Gradient Projection Method 
Initialization: 

Choose the initial conditions Q<°) = [Q< 0) , Q< 0) , . . . , Q^] T . Let 
k = 0. 
Main Loop: 



(fc) 



1,2, 



,K. 



1. Calculate the conjugate gradients 

' (k) (k) 

2. Choose an appropriate step size s k . Let Q 4 = 



- s k G 



(k) 



for i= 1,2,..., K. 
3. Let QW be the projection of Q'W onto fi+(P), where Cl+(P) = 
{Qi, i = l,.. -,K\Qi y 0, E,=i Tr{Qi} < P}. 



(fc+i) 



Q 



(k) 



4. Choose appropriate step size a k . Let Q 
qW),* = 1,2,...,A-. 

5. k = k + 1. If the maximum absolute value of the elements in Q 



(fc) 



(fc)_ 



Qf X) <e, fori = 1,2,. 



, L, then stop; else go to step 1 . 



Due to the complexity of the objective function in (2), 
we adopt the inexact line search method called "Armijo's 
Rule" to avoid excessive objective function evaluations, while 
still enjoying provable convergence [12]. The basic idea of 
Armijo's Rule is that at each step of the line search, we 
sacrifice accuracy for efficiency as long as we have sufficient 
improvement. According to Armijo's Rule, in the k th iteration, 
we choose Uk = 1 and — (3 mk (the same as in [11]), where 
rrik is the first non-negative integer m that satisfies 

F(Q( fe+1 )) - F(Q^) > cr/? m (G (fc) ,Q (fe) - Q( fc )> 

K 



a^^TrfGp) (Qf-Qf) 

»=i 



(3) 



where < (3 < 1 and < a < 1 are fixed scalars. 

Next, we will consider two major components in the CGP 
framework: 1) how to compute the conjugate gradient direction 
Gi, and 2) how to project Q ^ onto the set f2+(P) = 
{Q 4 , i = l,...,K\QihO, Eti Tr {QJ < P}- 
A. Computing the Conjugate Gradients 

The gradient = Vq i P(Q) depends on the partial 
derivatives of P(Q) with respect to By using the formula 

81ndct(A+BXC) = + BXC )-1 B ] T [U]) [14]> we can 

compute the partial derivative of P(Q) with respect to Qj as 
follows (by letting A = I + v| x , , , H;Q ; 1I,. B - Hj, 
X = Qi, and C = H;): 



dF(Q) 



d 



K 



log det I + ^HjQjHj 



K 



i=i 



(4) 



Further, from the definition V z f(z) = 2(df(z)/dz)* [15], we 
have 

fdF(Q)\ 



K 



G 4 = 2 



V dQ, J 



= 2H, [i + ^HjQ.H, ] Hi. (5) 

i=i 



Then, the conjugate gradient direction can be computed as 
G (fe) = £(fc) + pkG (k-i)^ In this paper ^ we adopt the 

Fletcher and Reeves' choice of deflection [12]. The Fletcher 
and Reeves' choice of deflection can be computed as 



pk = 



G 



(fe)||2 



G 



(fe-i). 



(6) 



The purpose of deflecting the gradient using (6) is to find G^ k \ 
which is the Hessian-conjugate of G| fc X \ By doing this, we 
can eliminate the "zigzagging" phenomenon encountered in 
the conventional gradient projection method, and achieve the 
superlinear convergence rate [12] without actually storing the 
matrix of Hessian approximation as in quasi-Newton methods. 



'(fe) 



B. Projection onto fi + (P) 

Noting from (5) that Gj is Hermitian. We have that Q 

(k) (k) 

Ql + s/cG- is Hermitian as well. Then, the projection 
problem becomes how to simultaneously project a set of K 
Hermitian matrices onto the set £1+(P), which contains a 
constraint on sum power for all users. This is different to 
[11], where the projection was performed on individual power 
constraint. In order to do this, we construct a block diagonal 
matrix D = Diag{Qi . . . Q K } G c(K-n r )x(K-n r ) _ It is easy 

to recognize that if Qi G £l+(P), i = 1,...,K, we have 
Tr(D) = YJLi Tr (Qi) < P ' and D ^ 0. In this paper, we 
use Frobenius norm, denoted by || • as the matrix distance 
criterion. Then, the distance between two matrices A and 
B is defined as ||A - B\\ F = (Tr [(A - B)t(A - B)]) K 
Thus, given a block diagonal matrix D, we wish to find a 
matrix D G il + (P) such that D minimizes ||D — T)\\f- For 
more convenient algebraic manipulations, we instead study the 
following equivalent optimization problem: 



Minimize |||D-D||| 
subject to Tr(D) < P, D y 0. 



(7) 



In (7), the objective function is convex in D, the constraint 
D y represents the convex cone of positive semidefinite 
matrices, and the constraint Tr(D) < P is a linear constraint. 
Thus, the problem is a convex minimization problem and we 
can exactly solve this problem by solving its Lagrangian dual 
problem. Associating Hermitian matrix X to the constraint 
D y 0, \i to the constraint Tr(D) < P, we can write the 
Lagrangian as 



ff(X,M) 



min{(l/2)||D-D||!-Tr(XtD) 

D l - 

+ /i(Tr(D)-p)}. (8) 



Since <?(X, fi) is an unconstrained convex quadratic minimiza- 
tion problem, we can compute the minimizer of (8) by simply 
setting the derivative of (8) (with respect to D) to zero, i.e., 
(D - D) - Xt + fjl = 0. Noting that X* = X, we have 



D = D /zl + X. Substituting D back into (8), we have 
g(X, M ) = 1||X - plf F -vP + Tr [( M I - X) (D + X - M I)] 



= -i||D- / iI + X||^- A1 P + i||D|| 2 . 



(9) 



Therefore, the Lagrangian dual problem can be written as 

Maximize — |||D - fjl + X\\ F - \iP + ^\\B\\ 2 ^ 
subject to XH,/i>0. 

After solving (10), we can have the optimal solution to (7) as: 

D* = D-^i*I + X*, (11) 

where fi* and X* are the optimal dual solutions to Lagrangian 
dual problem in (10). Although the Lagrangian dual problem 
in (10) has a similar structure as that in the primal problem 
in (7) (having a positive semidefinitive matrix constraint), we 
find that the positive semidefinite matrix constraint can indeed 
be easily handled. To see this, we first introduce Moreau 
Decomposition Theorem from convex analysis. 

Theorem 1: (Moreau Decomposition [16]) Let K, be a 
closed convex cone. For x, xi,x 2 G C p , the two properties 
below are equivalent: 

1) x = xi + x 2 with xi G JC, x 2 G JC° and (xi, x 2 ) = 0, 

2) xi = p/c(x) and x 2 = pk.°{x), 

where K° = {s G C p : (s,y) < 0, Vy G JC} is called the 
polar cone of cone JC, pjt(-) represents the projection onto 
cone JC. 

In fact, the projection onto a cone JC is analogous to the 
projection onto a subspace. The only difference is that the 
orthogonal subspace is replaced by the polar cone. 

Now we consider how to project a Hermitian matrix A G 
C" x ™ onto the positive and negative semidefinite cones. First, 
we can perform eigenvalue decomposition on A yielding A = 
UDiag{Ai, i = 1, . . . ,n}U^, where U is the unitary matrix 
formed by the eigenvectors corresponding to the eigenvalues 
Xi, i = 1, . . . , n. Then, we have the positive semidefinite and 
negative semidefinite projections of A as follows: 



UDiag{max{A 4 , 0}, i = 1, 2, . 
: UDiag{min{A 4 ,0},i = 1,2,. 



,n}Ut, 
) n}U t . 



(12) 
(13) 



The proof of (12) and (13) is a straightforward application of 
Theorem 1 by noting that A + y 0, A_ ^ 0, (A+,A_) = 
0, A + + A_ = A, and the positive semidefinite cone and 
negative semidefinite cone are polar cones to each other. 

We now consider the term D — /jj + X, which is the 
only term involving X in the dual objective function. We 
can rewrite it as D — /il — (—X), where we note that 
—X < 0. Finding a negative semidefinite matrix —X such 
that ||D — /jl — (— X)||jr is minimized is equivalent to finding 
the projection of D — fil onto the negative semidefinite cone. 
From the previous discussions, we immediately have 



-X = (D 



(14) 



Since D fjl = (D ^1)+ + (D - ^I)_, substituting (14) 
back to the Lagrangian dual objective function, we have 



min||D-/xI + X|| F 



(D - ^I) + . 



(15) 



Thus, the matrix variable X in the Lagrangian dual problem 
can be removed and the Lagrangian dual problem can be 
rewritten to 

|2 



Maximize ip(ii) = 
subject to fi>0. 



■±\\(T>-nI) + \ ]F 



*H D H (16) 



Suppose that after performing eigenvalue decomposition on D, 
we have D = UAU*, where A is the diagonal matrix formed 
by the eigenvalues of D, U is the unitary matrix formed by 
the corresponding eigenvectors. Since U is unitary, we have 
(D - ^I) + = U (A - ^I) + Ut. It then follows that 



|(D-/J) + 



i2 



|(A- M I) + 



i2 

If' 



(17) 



We denote the eigenvalues in A by A^, i — 1, 2, . . . , K ■ n r . 
Suppose that we sort them in non-increasing order such that 
A = DiagjAi A 2 . . . X K n r }, where Ai > . . . > \K n r - It 
then follows that 



K- 



|(A-/iI) + || F = Y, (max{0,A, -^}) 2 . (18) 



From (18), we can rewrite if>(fj,) as 



W = -o E (max{0,A, - / i}) 2 - M P+-||D r 



(19) 



i=i 



It is evident from (19) that tp(fi) is continuous and (piece-wise) 
concave in /i. Generally, piece-wise concave maximization 
problems can be solved by using the subgradient method. 
However, due to the heuristic nature of its step size selec- 
tion strategy, subgradient algorithm usually does not perform 
well. In fact, by exploiting the special structure, (16) can be 
efficiently solved. We can search the optimal value of fj, as 
follows. Let / index the pieces of ip(u), I = 0, 1, . . . , K ■ n r . 
Initially we set I = and increase I subsequently. Also, we 
introduce A = oo and \K-n r +i = — oo. We let the endpoint 
objective value ipf(Xo) — 0, <fr* — ipf(Xo), and [i* = Ao- 
If I > K ■ n r , the search stops. For a particular index /, by 
setting 



I^(A i - M ) 2 - M P] =0, 



we have 



»f = 



(20) 



(21) 



Now we consider the following two cases: 

1) If /U*- £ [A; +1 ,A 7 -] n M+, where R+ denotes the set 
of non-negative real numbers, then we have found the 
optimal solution for /x because ip(ii) is concave in [i. 
Thus, the point having zero-value first derivative, if 



exists, must be the unique global maximum solution. 
Hence, we can let /i* = and the search is done. 
2) If n*. g [A/ +1 , Xf] n M + , we must have that the local 
maximum in the interval [A/ +1 , Aj] flK + is achieved at 
one of the two endpoints. Note that the objective value 
V>/ (Xj) has been computed in the previous iteration 
because from the continuity of the objective function, 
we have ipj (Aj) = ijjj _ 1 (A/). Thus, we only need to 
compute the other endpoint objective value ipj (A^ +1 ). 
If tpf < ip} (Xf) — 4>*, then we know [i* is the 

optimal solution; else let fi* = 4>* = ipj (A/ +1 ), 

1 = 1+1 and continue. 
Since there are K ■ n r + 1 intervals in total, the search process 
takes at most K ■ n r + 1 steps to find the optimal solution fi*. 
Hence, this search is of polynomial-time complexity 0(n r K). 
After finding /i*, we can compute D* as 



D* 



(D-/i*I) + =U(A-/i*I) + Ut. 



(22) 



That is, the projection D can be computed by adjusting 
the eigenvalues of D using ^* and keeping the eigenvectors 
unchanged. The projection of D„ onto + (P) is summarized 
in Algorithm 2. 

Algorithm 2 Projection onto Q + (P) 
Initiation: 

1. Construct a block diagonal matrix D. Perform eigenvalue decompo- 
sition D = UAljt, sort the eigenvalues in non-increasing order. 

2. Introduce Ao = oo and A_ff » t +l = —oo. Let 1 = 0. Let the 
endpoint objective value ipj (Ao) = 0, (f>* = tf>l (Ao), and /j,* = Ao. 

Main Loop: 

1. If 7 > K ■ n r , go to the final step; else let = (J2j=i Xj - P)/I- 

2. If /i*. 6 [A ! Xf] nR+, then let fi* = fi*. and go to the final step. 

3. Compute tj)j(Xj +1 ). If t(>j(Xj +1 ) < cj>*, then go to the final step; 
else let n* = Xf +1 , <f>* = tf>j(Xf +1 ), 1 = 1 + 1 and continue. 

Final Step: Compute D as D = U (A - jt*I), Ut. 



IV. Complexity Analysis 

In this section, we compare our proposed CGP with other 
existing methods for solving MIMO BC. Similar to IWFs, 
SD [8], and DD [9], CGP has the desirable linear complexity 
property. Although CGP also needs to compute gradients in 
each iteration, the computation is much easier than that in SD 
due to the different perspectives in handling MIMO BC. Thus, 
in this paper, we only compare CGP with IWF (Algorithms 1 
and 2 in [10]), which appear to be the simplest in the literature 
so far. For convenience, we will refer to Algorithm 1 and 
Algorithm 2 in [10] as IWF1 and IWF2, respectively. 

To better illustrate the comparison, we list the complex- 
ity per iteration for each component of CGP and IWFs in 
Table I. In both CGP and IWFs, it can be seen that the 
most time-consuming part (increasing with respect to K) is 
the additions of the terms in the form of H^QiHi when 
computing gradients and effective channels. Since the term 
(I + Si=i-HjQiHi) is common to all gradients, we only 
need to compute this sum once in each iteration. Thus, the 



TABLE I 

Per Iteration Complexity Comparison between CGP and IWFs 





CGP 


IWFs 


Gradient/Effec. Channel 


K 


2K 


Line Search 


O(rnK) 


N/A 


Projection/Water-Filling 


0(n r K) 


0(n r K) 


Overall 


0((m + 1 + n r )K) 


0{{2 + n r )K) 



number of such additions per iteration for CGP is K. In 
IWF1 and IWF2, the number of such additions can be reduced 
to 2K by a clever way of maintaining a running sum of 
(I + HjQjHj). But the running sum, which requires 

K 2 additions for IWF1, still needs to be computed in the 
initialization step. 

Although the basic ideas of the projection in CGP and 
water-filling are different, the algorithm structure of them are 
very similar and they have exactly the same complexity of 
0(n r K). The only unique component in CGP is the line 
search step, which has the complexity of 0(mK) (in terms 
of the additions of HjQjHj terms), where m is the number 
of trials in Armijo's Rule. Therefore, the overall complexity 
per iteration for CGP and IWFs are 0((m + 1 + n r )K) and 
0((2 + n r )K), respectively. According to our computational 
experience, the value of m usually lies in between two and 
four. Thus, when n r is large (e.g., n r > 4), the overall 
complexity per iteration for CGP and IWFs are comparable. 
However, as evidenced in the next section, the numbers of 
iterations required for convergence in CGP is much less than 
that in IWFs for large MIMO BC systems, and it is very 
insensitive to the increase of the number of users. Moreover, 
CGP has a modest memory requirement: it only requires the 
solution information from the previous step, as opposed to 
IWF1, which requires previous K — 1 steps. 

V. Numerical Results 

Due to the space limitation, we only give an example 
of a large MIMO BC system consisting of 100 users with 
rit = n r = 4 in here. The convergence processes are plotted 
in Fig. 1. It is observed from Fig. 1 that CGP takes only 29 
iterations to converge and it outperforms both IWFs. IWFl's 
convergence speed significantly drops after the quick improve- 
ment in the early stage. It is also seen in this example that 
IWF2's performance is inferior to IWF1, and this observation 
is in accordance with the results in [10]. Both IWF1 and 
IWF2 fail to converge within 100 iterations. The scalability 
problem of both IWFs is not surprising because in both IWFs, 
the most recently updated covariance matrices only account 
for a fraction of 1/ if in the effective channels' computation, 
which means it does not effectively make use of the most 
recent solution. In all of our numerical examples with different 
number of users, CGP always converges within 30 iterations. 

VI. Conclusion 

In this paper, we developed an efficient algorithm based on 
conjugate gradient projection (CGP) for solving the maximum 




Number of Iterations 



Fig. 1. Comparison in a 100-user MIMO BC channel with nt = n r = 4. 

sum rate problem of MIMO BC. We theoretically and numer- 
ically analyzed its its complexity and convergence behavior. 
The attractive features of CGP and encouraging results showed 
that CGP is an excellent method for solving the maximum sum 
rate problem for large MIMO BC systems. 
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